Sparse partial least squares regression for on-line variable selection with multivariate data streams

Authors

  • Brian McWilliams
  • Giovanni Montana
Abstract

Data streams arise in many domains. For instance, in computational finance, several statistical applications revolve around the real-time discovery of associations between a very large number of co-evolving data feeds representing asset prices. The problem we tackle in this paper consists of learning a linear regression function from multivariate input and output streaming data in an incremental fashion while also performing dimensionality reduction and variable selection. When input and output streams are high-dimensional and correlated, it is plausible to assume the existence of hidden factors that explain a large proportion of the covariance between them. The methods we propose build on recursive partial least squares (PLS) regression. The hidden factors are dynamically inferred and tracked over time and, within each factor, the most important streams are recursively identified by means of sparse matrix decompositions. Moreover, the recursive regression model is able to adapt to sudden changes in the data generating mechanism and also identifies the number of latent factors. Extensive simulation results illustrate how the methods perform and compare with alternative penalized regression models for streaming data. We also apply the algorithm to solve a multivariate version of the enhanced index tracking problem in computational finance. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 170–193, 2010
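The abstract describes tracking latent factors recursively and, within each factor, selecting the most important input streams through a sparse matrix decomposition. As an illustration only, the Python sketch below assumes two ingredients that the abstract does not spell out: an exponentially weighted cross-covariance matrix M_t = λ M_{t−1} + x_t y_tᵀ (the forgetting factor λ lets old observations fade, which is one common way of adapting to changes in the data generating mechanism), and a soft-thresholded power iteration that returns a sparse rank-one approximation of M_t. The class name `OnlineSparsePLS`, its parameters and the relative threshold rule are hypothetical. Only the first latent factor is computed; deflation for further factors and the choice of the number of factors are omitted, and this is not the authors' algorithm.

```python
import math
import random


def soft_threshold(values, t):
    """Shrink each entry towards zero by t; entries with |value| <= t become exactly 0."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in values]


def normalize(values):
    """Scale to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in values))
    return [a / norm for a in values] if norm > 0 else values


class OnlineSparsePLS:
    """Hypothetical sketch: tracks M_t = forgetting * M_{t-1} + x_t y_t^T and
    extracts the first sparse latent factor by a soft-thresholded power iteration."""

    def __init__(self, p, q, forgetting=0.99, threshold=0.3):
        self.p, self.q = p, q
        self.forgetting = forgetting  # assumed exponential forgetting factor in (0, 1]
        self.threshold = threshold    # assumed sparsity level, as a fraction of max |M v|
        self.M = [[0.0] * q for _ in range(p)]

    def update(self, x, y):
        """Recursive rank-one update of the p x q input/output cross-covariance."""
        for i in range(self.p):
            row = self.M[i]
            for j in range(self.q):
                row[j] = self.forgetting * row[j] + x[i] * y[j]

    def first_sparse_factor(self, n_iter=100):
        """Return sparse input weights u, output weights v and the selected input indices."""
        v = normalize([1.0] * self.q)
        u = [0.0] * self.p
        for _ in range(n_iter):
            # Left update: u <- normalized soft-threshold of M v (this induces sparsity).
            mv = [sum(self.M[i][j] * v[j] for j in range(self.q)) for i in range(self.p)]
            t = self.threshold * max(abs(a) for a in mv)
            u = normalize(soft_threshold(mv, t))
            if not any(u):
                break  # cross-covariance is still zero: nothing to select yet
            # Right update: v <- normalized M^T u (outputs are not penalized here).
            mtu = [sum(self.M[i][j] * u[i] for i in range(self.p)) for j in range(self.q)]
            v = normalize(mtu)
        selected = [i for i, a in enumerate(u) if a != 0.0]
        return u, v, selected


# Usage on a hypothetical synthetic stream: only inputs 0 and 1 drive the two responses.
rng = random.Random(0)
model = OnlineSparsePLS(p=6, q=2, forgetting=0.999, threshold=0.3)
for _ in range(5000):
    x = [rng.gauss(0.0, 1.0) for _ in range(6)]
    model.update(x, [2.0 * x[0] + x[1], x[0] + x[1]])
u, v, selected = model.first_sparse_factor()
print("selected inputs:", selected)
```

On this synthetic stream the nonzero entries of `u` should fall on inputs 0 and 1. Expressing the threshold as a fraction of the largest entry of M v keeps the sketch independent of the data scale and of the effective window length 1/(1 − λ), at the cost of always retaining at least one input; an absolute penalty, as in many penalized rank-one SVD formulations, is an equally plausible choice.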

Similar resources

Sparse partial least squares for on-line variable selection in multivariate data streams

In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition ...

Predictive modeling with high-dimensional data streams: an on-line variable selection approach

In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition ...

Sparse partial least squares regression for simultaneous dimension reduction and variable selection

Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very lar...

Expression quantitative trait loci mapping with multivariate sparse partial least squares regression

Expression quantitative trait loci (eQTL) mapping concerns finding genomic variation to elucidate variation of expression traits. This problem poses significant challenges due to high dimensionality of both the gene expression and the genomic marker data. We propose a multivariate response regression approach with simultaneous variable selection and dimension reduction for the eQTL mapping prob...

An Introduction to the ‘spls’ Package, Version 1.0

This vignette provides basic information about the ‘spls’ package. SPLS stands for “Sparse Partial Least Squares”. The SPLS regression methodology is developed in [1]. The main principle of this methodology is to impose sparsity within the context of partial least squares and thereby carry out dimension reduction and variable selection simultaneously. SPLS regression exhibits good performance e...

Journal:
  • Statistical Analysis and Data Mining

Volume 3, pages 170–193

Publication date: 2010